
Hypothesis Testing for Generalized Thurstone Models

Makur, Anuran, Singh, Japneet

arXiv.org Machine Learning

In this work, we develop a hypothesis testing framework to determine whether pairwise comparison data is generated by an underlying \emph{generalized Thurstone model} $\mathcal{T}_F$ for a given choice function $F$. While prior work has predominantly focused on parameter estimation and uncertainty quantification for such models, we address the fundamental problem of minimax hypothesis testing for $\mathcal{T}_F$ models. We formulate this testing problem by introducing a notion of separation distance between general pairwise comparison models and the class of $\mathcal{T}_F$ models. We then derive upper and lower bounds on the critical threshold for testing that depend on the topology of the observation graph. For the special case of complete observation graphs, this threshold scales as $\Theta((nk)^{-1/2})$, where $n$ is the number of agents and $k$ is the number of comparisons per pair. Furthermore, we propose a hypothesis test based on our separation distance, construct confidence intervals, establish time-uniform bounds on the probabilities of type I and II errors using reverse martingale techniques, and derive minimax lower bounds using information-theoretic methods. Finally, we validate our results through experiments on synthetic and real-world datasets.
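The model class under test can be made concrete with a short simulation sketch. A generalized Thurstone model with choice function $F$ posits $P(i \text{ beats } j) = F(s_i - s_j)$ for latent scores $s$; the function names, score values, and the choice of the Gaussian CDF (Thurstone's Case V) below are illustrative assumptions, not taken from the paper:

```python
import math
import random

def gaussian_cdf(x):
    # Thurstone Case V choice function: standard normal CDF via erf
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def simulate_comparisons(scores, k, F=gaussian_cdf, seed=0):
    """Simulate k comparisons per pair under P(i beats j) = F(s_i - s_j).
    Returns wins[(i, j)] = number of times agent i beat agent j, for i < j."""
    rng = random.Random(seed)
    n = len(scores)
    wins = {}
    for i in range(n):
        for j in range(i + 1, n):
            p = F(scores[i] - scores[j])
            wins[(i, j)] = sum(rng.random() < p for _ in range(k))
    return wins

scores = [1.0, 0.5, 0.0, -0.5]
wins = simulate_comparisons(scores, k=500)
# the empirical win rate of agent 0 over agent 3 concentrates near F(1.5)
```

A hypothesis test in the paper's framework would then ask whether observed win counts like these are consistent with *some* score vector under the given $F$.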


Minimax Hypothesis Testing for the Bradley-Terry-Luce Model

Makur, Anuran, Singh, Japneet

arXiv.org Artificial Intelligence

The Bradley-Terry-Luce (BTL) model is one of the most widely used models for ranking a collection of items or agents based on pairwise comparisons among them. In this work, our objective is to formulate a hypothesis test that determines whether a given pairwise comparison dataset, with k comparisons per pair of agents, originates from an underlying BTL model. We formalize this testing problem in the minimax sense and define the critical threshold of the problem. We then establish upper bounds on the critical threshold for general induced observation graphs (satisfying mild assumptions) and develop lower bounds for complete induced graphs. In particular, our test statistic for the upper bounds is based on a new approximation we derive for the separation distance between general pairwise comparison models and the class of BTL models. To further assess the performance of our statistical test, we prove upper bounds on the type I and type II probabilities of error. Much of our analysis is conducted within the context of a fixed observation graph structure, where the graph possesses certain "nice" properties, such as expansion and bounded principal ratio. Finally, we conduct several experiments on synthetic and real-world datasets to validate some of our theoretical results. We also propose an approach based on permutation testing to determine the threshold of our test in a data-driven manner in these experiments. In recent years, the availability of pairwise comparison data and its subsequent analysis have significantly increased across diverse domains. Pairwise comparison data consists of information gathered in the form of comparisons made among a given set of items or agents. Many real-world applications, including sports tournaments, consumer preference surveys, and political voting, generate data in the form of pairwise comparisons.
Such datasets serve a range of purposes, such as ranking items [2]-[12], analyzing team performance over time [13], studying market or sports competitiveness [14], [15], and even fine-tuning large language models using reinforcement learning from human feedback [16], [17]. A popular modeling assumption in such learning and inference tasks is that the data conforms to an underlying Bradley-Terry-Luce (BTL) model [2]-[6] as a generative model. Under the BTL model, each agent $i$ is associated with a latent skill parameter $\theta_i \in \mathbb{R}$, and $\mathbb{P}(i \text{ is preferred over } j) = e^{\theta_i}/(e^{\theta_i} + e^{\theta_j})$. The BTL model is known to be a natural consequence of the assumption of independence of irrelevant alternatives (IIA), which is widely used in economics and social choice theory [3].
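As a minimal illustration of working with this model (a sketch of the classical Zermelo/MM iteration for the BTL likelihood, not the paper's test statistic; the function names and two-agent example are my own), the BTL skills can be fit from a matrix of win counts:

```python
import math

def btl_prob(theta_i, theta_j):
    """BTL: P(i preferred over j) = e^{theta_i} / (e^{theta_i} + e^{theta_j})."""
    return 1.0 / (1.0 + math.exp(theta_j - theta_i))

def btl_mle(wins, n, iters=200):
    """Fit BTL skill weights w_i (w_i = e^{theta_i}) by Zermelo's iteration.
    wins[i][j] = number of times agent i beat agent j."""
    w = [1.0] * n
    for _ in range(iters):
        new = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            # each comparison between i and j contributes 1/(w_i + w_j)
            denom = sum((wins[i][j] + wins[j][i]) / (w[i] + w[j])
                        for j in range(n) if j != i)
            new.append(total_wins / denom if denom > 0 else w[i])
        s = sum(new)
        w = [x * n / s for x in new]  # normalize: skills identifiable up to scale
    return w

# deterministic 2-agent example: 67 wins vs 33 out of 100 comparisons
wins = [[0, 67], [33, 0]]
w = btl_mle(wins, n=2)
# the fitted skill ratio w[0]/w[1] matches the empirical odds 67/33
```

The normalization step reflects that BTL skills are only identifiable up to a common scale (equivalently, the $\theta_i$ up to an additive shift).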


Statistical inference for pairwise comparison models

Han, Ruijian, Tang, Wenlu, Xu, Yiming

arXiv.org Machine Learning

Pairwise comparison involves assessing subjects in pairs to establish their relative preferences, a concept relevant to numerous applications such as sports analytics [13, 20, 30], econometrics [23], video coding [19], and social science [22], to name just a few. A prevalent method for pairwise comparison modeling employs a latent score framework. Originating from ideas in Thurstone [29] and Zermelo [33], a mathematical model for pairwise comparison data analysis was proposed by Bradley and Terry [4]. Subsequent developments led to multiple generalizations, including ordinal models such as the Rao-Kupper model [26] and the Davidson model [10], which account for ties, the cumulative link model [1] that considers more refined ordinal scales, and cardinal models such as the paired cardinal model [27]. We recommend [5] for a review of the related topics in pairwise comparison modeling. Given the growing number of subjects in the big-data era, recent research trends focus on understanding the asymptotic behavior of estimating the latent score vector as the number of compared subjects approaches infinity.


A General Pairwise Comparison Model for Extremely Sparse Networks

Han, Ruijian, Xu, Yiming, Chen, Kani

arXiv.org Machine Learning

Statistical inference using pairwise comparison data has been an effective approach to analyzing complex and sparse networks. In this paper we propose a general framework for modeling the mutual interaction in a probabilistic network, which enjoys ample flexibility in terms of parametrization. Within this set-up, we establish that the maximum likelihood estimator (MLE) for the latent scores of the subjects is uniformly consistent under a near-minimal condition on network sparsity. This condition is sharp in terms of the leading order asymptotics describing the sparsity. The proof utilizes a novel chaining technique based on the error-induced metric as well as careful counting of comparison graph structures. Our results guarantee that the MLE is a valid estimator for inference in large-scale comparison networks where data is asymptotically deficient. Numerical simulations are provided to complement the theoretical analysis.
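To get a feel for the sparsity regime, here is a small, self-contained sketch (illustrative constants only; the paper's framework is more general than this example): sample an Erdős–Rényi comparison graph near the classical connectivity threshold $p = \log n / n$. Connectivity of the comparison graph is necessary for the latent scores to be identifiable at all, since scores in separate components are never compared against one another:

```python
import math
import random
from collections import deque

def er_graph(n, p, seed=0):
    """Erdos-Renyi comparison graph: each pair is compared with probability p."""
    rng = random.Random(seed)
    adj = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].append(j)
                adj[j].append(i)
    return adj

def is_connected(adj):
    """BFS from vertex 0; scores are identifiable only on a connected
    comparison graph (every pair linked by a chain of comparisons)."""
    n = len(adj)
    seen, queue = {0}, deque([0])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == n

# a few multiples above the threshold log(n)/n, the sampled graph is
# connected with high probability despite being very sparse
n = 400
sparse = er_graph(n, p=3 * math.log(n) / n)
```

Repeating this experiment while shrinking the constant in front of $\log n / n$ gives an empirical picture of the near-minimal sparsity condition the paper analyzes.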